Deep Recurrent Neural Network Based Monaural Speech Separation Using Recurrent Temporal Restricted Boltzmann Machines
Authors
Abstract
This paper presents a single-channel speech separation method implemented with a deep recurrent neural network (DRNN) using recurrent temporal restricted Boltzmann machines (RTRBMs). Although deep neural network (DNN) based speech separation (denoising) methods perform quite well compared to conventional statistical-model-based speech enhancement techniques, DNN-based methods often ignore the temporal correlations across speech frames, resulting in a loss of spectral detail in the reconstructed output speech. To alleviate this issue, one RTRBM is employed to model the acoustic features of the input (mixture) signal, and two RTRBMs are trained for the two training targets (source signals). Each RTRBM models both the abstractions present in the training data at each time step and the temporal dependencies across time steps. The entire network (three RTRBMs and one recurrent neural network) is fine-tuned by joint optimization of the DRNN with an extra masking layer that enforces a reconstruction constraint. The proposed method has been evaluated on the IEEE corpus and the TIMIT dataset for the speech denoising task. Experimental results show that the proposed approach outperforms NMF-based, conventional DNN-based, and DRNN-based speech enhancement methods.
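To make the masking layer mentioned in the abstract concrete, the sketch below illustrates the standard soft time-frequency masking formulation used in DRNN-based separation (cf. the Huang et al. work listed under similar resources). It is not the authors' code: the function name soft_mask_layer, the tensor shapes, and the PyTorch framing are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' implementation): a soft
# time-frequency masking layer that enforces the reconstruction constraint
# y1_tilde + y2_tilde = mixture, given two raw network source estimates.
import torch

def soft_mask_layer(y1_hat: torch.Tensor,
                    y2_hat: torch.Tensor,
                    mixture: torch.Tensor,
                    eps: float = 1e-8):
    """All tensors are magnitude spectrograms of shape (time, freq)."""
    denom = y1_hat.abs() + y2_hat.abs() + eps   # avoid division by zero
    mask1 = y1_hat.abs() / denom                # soft mask for source 1
    mask2 = y2_hat.abs() / denom                # soft mask for source 2
    return mask1 * mixture, mask2 * mixture

# Hypothetical usage with random tensors standing in for DRNN outputs:
x = torch.rand(300, 513)        # mixture magnitude spectrogram
y1_hat = torch.rand(300, 513)   # raw estimate of source 1
y2_hat = torch.rand(300, 513)   # raw estimate of source 2
y1_tilde, y2_tilde = soft_mask_layer(y1_hat, y2_hat, x)
assert torch.allclose(y1_tilde + y2_tilde, x, atol=1e-4)
```

Because the two masked outputs sum to the mixture by construction, joint fine-tuning through this layer couples the two source estimates instead of training them independently.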
Similar resources
A Hierarchy of Recurrent Networks for Speech Recognition
Generative models for sequential data based on directed graphs of Restricted Boltzmann Machines (RBMs) are able to accurately model high dimensional sequences as recently shown. In these models, temporal dependencies in the input are discovered by either buffering previous visible variables or by recurrent connections of the hidden variables. Here we propose a modification of these models, the ...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Singing-Voice Separation from Monaural Recordings using Deep Recurrent Neural Networks
Monaural source separation is important for many real world applications. It is challenging since only single channel information is available. In this paper, we explore using deep recurrent neural networks for singing voice separation from monaural recordings in a supervised setting. Deep recurrent neural networks with different temporal connections are explored. We propose jointly optimizing ...
Development of a Sound Coding Strategy based on a Deep Recurrent Neural Network for Monaural Source Separation in Cochlear Implants
The aim of this study is to investigate whether a source separation algorithm based on a deep recurrent neural network (DRNN) can provide a speech perception benefit for cochlear implant users when speech signals are mixed with another competing voice. The DRNN is based on an existing architecture that is used in combination with an extra masking layer for optimization. The approach has been ev...
High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion
This paper presents a voice conversion (VC) method that utilizes recently proposed recurrent temporal restricted Boltzmann machines (RTRBMs) for each speaker, with the goal of capturing high-order temporal dependencies in an acoustic sequence. Our algorithm starts from the separate training of two RTRBMs for a source and target speaker using speaker-dependent training data. Since each RTRBM att...
Journal:
Volume/Issue: -
Pages: -
Publication year: 2017